March 2023

Differential Gene Expression Analysis Workflow


FASTQ file format

FASTQ file format - Headers

FASTQ file format - Sequences

FASTQ file format - Third line

FASTQ file format - Quality Scores
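Each FASTQ record spans four lines: a header starting with '@', the sequence, a third line starting with '+' (sometimes followed by the header again), and a quality string with one character per base. The sketch below reads records four lines at a time and checks those rules. It is a minimal illustration, not a replacement for an established parser, and the read shown is a hypothetical example.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    header: str    # line 1, without the leading '@'
    sequence: str  # line 2: the called bases
    quality: str   # line 4: one ASCII-encoded quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield FASTQ records from a text stream, reading four lines at a time."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")  # line 3: '+', optionally followed by the header again
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@"):
            raise ValueError(f"header line should start with '@': {header!r}")
        if not separator.startswith("+"):
            raise ValueError(f"third line should start with '+': {separator!r}")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lines differ in length")
        yield FastqRecord(header[1:], sequence, quality)


# Hypothetical single-read example
example = io.StringIO(
    "@read_001 hypothetical example read\n"
    "GATTTGGGGTTCAAAGCAGT\n"
    "+\n"
    "IIIIIIIIIIHHHHGGFF##\n"
)
for record in read_fastq(example):
    print(record.header, record.sequence, record.quality)
```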

(Phred) Quality Scores

Sequence quality scores are log-transformed p-values (error probabilities), translated into ASCII characters

  • Sequence bases are called after image processing (base calling)
    • Each base in a sequence has a p-value associated with it
    • p-values range from 0 to 1 (e.g. 0.05, 0.01, 1e-30)
    • A p-value of 0.01 means a 1 in 100 chance that the called base is wrong
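The transformation is the Phred formula Q = -10 × log10(P), where P is the error probability (the p-value above), so P = 0.01 gives Q20 and P = 0.001 gives Q30. The score is then stored as one ASCII character. The sketch assumes the Phred+33 offset used by Sanger and Illumina 1.8+ files; older Illumina pipelines used +64, so check which encoding your files use.

```python
import math

PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding


def prob_to_phred(p_error: float) -> float:
    """Convert the probability that a base call is wrong into a Phred score."""
    return -10 * math.log10(p_error)


def phred_to_prob(q: float) -> float:
    """Inverse transform: Phred score back to an error probability."""
    return 10 ** (-q / 10)


def encode_quality(q: int, offset: int = PHRED_OFFSET) -> str:
    """Translate an integer Phred score into its printable ASCII character."""
    char_code = q + offset
    if not 33 <= char_code <= 126:  # printable ASCII runs from '!' (33) to '~' (126)
        raise ValueError(f"Q{q} cannot be stored as a printable ASCII character")
    return chr(char_code)


def decode_quality(char: str, offset: int = PHRED_OFFSET) -> int:
    """Translate an ASCII quality character back into a Phred score."""
    return ord(char) - offset


for p in (0.05, 0.01, 0.001):
    q = prob_to_phred(p)
    print(f"p={p}: Q={q:.1f}, encoded as {encode_quality(round(q))!r}")
```

With a +33 offset the printable range tops out at Q93, so a probability as small as 1e-30 (Q300) cannot be stored as-is.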

QC is important

At every stage, we should check for problems before putting time and effort into analysing potentially bad data.

  • Start by running FastQC on our sequencing files
    • Quick
    • Outputs an easy-to-read HTML report

Running FastQC

We run FastQC from the terminal with the command:

fastqc <path/to/fastq/files>

FastQC has many other parameters for tailoring your QC; you can list them by typing:

fastqc -h
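If FastQC is run from a script rather than typed by hand, a small wrapper can assemble the same command. This sketch assumes `fastqc` is on the PATH and uses two options listed by `fastqc -h`: `-o` (output directory) and `-t` (number of threads). The function names are hypothetical helpers, not part of FastQC.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Sequence


def build_fastqc_command(fastq_files: Sequence[str], outdir: str, threads: int = 1) -> List[str]:
    """Assemble a fastqc command line: -o sets the output directory, -t the thread count."""
    if not fastq_files:
        raise ValueError("no FASTQ files given")
    return ["fastqc", "-o", outdir, "-t", str(threads), *fastq_files]


def run_fastqc(fastq_files: Sequence[str], outdir: str, threads: int = 1) -> None:
    """Run FastQC on the given files, stopping with an error if it fails."""
    if shutil.which("fastqc") is None:
        raise RuntimeError("fastqc not found on PATH")
    # Make sure the output directory exists before FastQC writes its reports there
    Path(outdir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_fastqc_command(fastq_files, outdir, threads), check=True)
```

For example, `run_fastqc(["sample_R1.fastq.gz", "sample_R2.fastq.gz"], "fastqc_results", threads=2)` (hypothetical file names) runs the equivalent of `fastqc -o fastqc_results -t 2 sample_R1.fastq.gz sample_R2.fastq.gz`.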

Per base sequence quality

Good Data

Bad Data
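The per base sequence quality module summarises the quality scores at each position along the reads; in Illumina data quality commonly drops towards the end of the read. The sketch below computes only the mean score per position, assuming Phred+33 encoding. It is a simplified illustration of what the plot measures, not FastQC's implementation (FastQC draws box plots and groups positions for long reads).

```python
from statistics import mean
from typing import Dict, Iterable, List


def per_base_mean_quality(quality_strings: Iterable[str], offset: int = 33) -> List[float]:
    """Mean Phred score at each read position (position 1, 2, 3, ...)."""
    scores_by_position: Dict[int, List[int]] = {}
    for qual in quality_strings:
        for pos, char in enumerate(qual):
            # Decode each ASCII character back into a Phred score (Phred+33 assumed)
            scores_by_position.setdefault(pos, []).append(ord(char) - offset)
    return [mean(scores_by_position[pos]) for pos in sorted(scores_by_position)]


# Hypothetical quality strings whose scores drop towards the end of the read
print(per_base_mean_quality(["IIIIIIHH", "IIIIHH##", "IIIIIH##"]))
```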

Per base sequence content

Good Data

Bad Data
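The per base sequence content module shows the proportion of A, C, G and T at each read position; in a random library the four lines should stay roughly flat across positions. The sketch below computes those percentages from raw sequences, ignoring any base other than A/C/G/T (such as N). It is a simplified illustration, not FastQC's own code, and the reads are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def per_base_content(sequences: Iterable[str]) -> List[Dict[str, float]]:
    """Percentage of A, C, G and T at each read position (other bases are ignored)."""
    counts: List[Counter] = []
    for seq in sequences:
        for pos, base in enumerate(seq.upper()):
            if pos == len(counts):
                counts.append(Counter())  # first read reaching this position
            counts[pos][base] += 1
    content = []
    for position_counts in counts:
        total = sum(position_counts[b] for b in "ACGT")
        content.append({b: 100 * position_counts[b] / total if total else 0.0 for b in "ACGT"})
    return content


# Hypothetical reads
for pos, pct in enumerate(per_base_content(["ACGTAC", "AAGTCC", "TCGAAC"]), start=1):
    print(pos, pct)
```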

Per sequence GC content

Good Data

Bad Data
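The per sequence GC content module builds a histogram of GC percentage across all reads and compares it with a modelled normal distribution; an unusual shape, such as a second peak, can point to contamination. The sketch below computes the read-level GC percentages and the histogram counts only. It is a simplified illustration, not FastQC's implementation, and the reads are hypothetical.

```python
from collections import Counter
from typing import Iterable


def gc_percent(seq: str) -> float:
    """GC content of one read as a percentage of its A/C/G/T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return 100 * (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def gc_distribution(sequences: Iterable[str]) -> Counter:
    """Number of reads at each GC percentage (rounded to a whole number)."""
    return Counter(round(gc_percent(s)) for s in sequences)


# Hypothetical reads
print(sorted(gc_distribution(["ATGCGC", "GCATAT", "ATATAT", "GGCCGC"]).items()))
```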

Adaptor content

Good Data

Bad Data
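The adaptor content module is a cumulative plot: at each position it shows the percentage of reads in which an adaptor sequence has been found at or before that position. The sketch below searches for a single adaptor by exact match, assuming the 12-base Illumina Universal Adapter prefix `AGATCGGAAGAG`; FastQC screens several other adaptors too, so treat this as a simplified illustration with hypothetical reads.

```python
from typing import List, Sequence

# Assumption: Illumina Universal Adapter prefix; other library preps use different adaptors
ILLUMINA_UNIVERSAL = "AGATCGGAAGAG"


def adapter_content(sequences: Sequence[str], adapter: str = ILLUMINA_UNIVERSAL) -> List[float]:
    """Cumulative % of reads in which the adaptor has been seen at or before each position."""
    max_len = max((len(s) for s in sequences), default=0)
    first_hit_counts = [0] * max_len
    for seq in sequences:
        pos = seq.upper().find(adapter)  # exact match only; no mismatches allowed
        if pos != -1:
            first_hit_counts[pos] += 1
    percentages, running = [], 0
    for count in first_hit_counts:
        running += count
        percentages.append(100 * running / len(sequences))
    return percentages


# Hypothetical reads: the first carries adaptor read-through starting at position 5
reads = ["ACGTAGATCGGAAGAGCC", "ACGTACGTACGTACGTAC"]
print(adapter_content(reads))
```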